Recommending Library Methods: An Evaluation of the Vector Space Model (VSM) and Latent Semantic Indexing (LSI)
Abstract
The development and maintenance of a reuse repository require significant investment, planning and managerial support. To minimise risk and ensure a healthy return on investment, reusable components should be accessible, reliable and of high quality. In this paper we concentrate on accessibility: we describe a technique that enables a developer to make effective and convenient use of large-scale libraries. Unlike most previous solutions to component retrieval, our tool, RASCAL, is a proactive component recommender. RASCAL recommends a set of task-relevant reusable components to a developer. Recommendations are produced using Collaborative Filtering (CF). We compare and contrast CF effectiveness when using two information retrieval techniques, namely the Vector Space Model (VSM) and Latent Semantic Indexing (LSI). We validate our technique on real-world examples and find that the overall results are encouraging; notably, RASCAL can produce reasonably good recommendations when they are most valuable, i.e., at an early stage in code development.
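To make the comparison in the abstract concrete, the following is a minimal sketch of collaborative filtering over a usage matrix, once in the raw vector space (VSM) and once after a truncated SVD projection (LSI). It is not RASCAL's implementation. The matrix layout (rows are code units acting as "users", columns are library methods), the function names, the neighbour count and the scoring rule are all assumptions made for illustration.

```python
import numpy as np


def cosine_sim(matrix, query):
    """Cosine similarity between each row of `matrix` and the vector `query`."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero vectors
    return matrix @ query / norms


def lsi_project(usage, rank):
    """Project the rows of `usage` into a `rank`-dimensional latent space (truncated SVD)."""
    u, s, _ = np.linalg.svd(usage, full_matrices=False)
    return u[:, :rank] * s[:rank]


def recommend(usage, active, k_neighbours=2, top_n=3, lsi_rank=None):
    """Return column indices (methods) that row `active` has not used yet.

    With lsi_rank=None, similarities are computed in the raw space (VSM);
    otherwise they are computed in the reduced latent space (LSI).
    """
    space = usage if lsi_rank is None else lsi_project(usage, lsi_rank)
    sims = cosine_sim(space, space[active])
    order = np.argsort(sims)[::-1]
    neighbours = order[order != active][:k_neighbours]  # nearest rows, excluding self
    # Hypothetical scoring rule: similarity-weighted usage counts of the neighbours.
    scores = sims[neighbours] @ usage[neighbours]
    scores[usage[active] > 0] = -np.inf  # hide methods already in use
    ranked = np.argsort(scores)[::-1][:top_n]
    return [int(j) for j in ranked if scores[j] > 0]


# Made-up usage counts: 5 code units x 5 library methods.
usage = np.array([
    [3, 1, 0, 0, 0],
    [2, 1, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 1, 3, 1],
    [2, 0, 0, 0, 0],  # an "early stage" unit that has used only one method
], dtype=float)

print("VSM:", recommend(usage, active=4))
print("LSI:", recommend(usage, active=4, lsi_rank=2))
```

The last row mimics the early-stage situation the abstract highlights: the active code unit has used a single method, and the recommender ranks methods used by its most similar neighbours. Whether VSM or LSI gives better recommendations on real data is the empirical question the paper evaluates; this toy matrix does not settle it.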
Similar resources
On the Performance of Latent Semantic Indexing based Information Retrieval
Conventional vector-based Information Retrieval (IR) models, the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high dimensional data places great demands on computing resources. To overcome these problems, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower dimensiona...
Analysis of a Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for Information Retrieval
Latent Semantic Indexing (LSI), a variant of classical Vector Space Model (VSM), is an Information Retrieval (IR) model that attempts to capture the latent semantic relationship between the data items. Mathematical lattices, under the framework of Formal Concept Analysis (FCA), represent conceptual hierarchies in data and retrieve the information. However, both LSI and FCA use the data represen...
Latent Semantic Indexing for Patent Documents
Since the huge database of patent documents is continuously increasing, the issue of classifying, updating and retrieving patent documents turned into an acute necessity. Therefore, we investigate the efficiency of applying Latent Semantic Indexing, an automatic indexing method of information retrieval, to some classes of patent documents from the United States Patent Classification System. We ...
Latent Semantic Indexing Based on Factor Analysis
The main purpose of this paper is to propose a novel latent semantic indexing (LSI), statistical approach to simultaneously mapping documents and terms into a latent semantic space. This approach can index documents more effectively than the vector space model (VSM). Latent semantic indexing (LSI), which is based on singular value decomposition (SVD), and probabilistic latent semantic indexing ...
Approximate Dimension Equalization in Vector-based Information Retrieval
Vector-based information retrieval methods such as the vector space model (VSM), latent semantic indexing (LSI), and the generalized vector space model (GVSM) represent both queries and documents by high-dimensional vectors learned from analyzing a training corpus of text. VSM scales well to large collections, but cannot represent term–term correlations, which prevents it from being used in tra...
Publication date: 2006